Transformer layers, which use an alternating pattern of multi-head attention and multi-layer perceptron (MLP) layers, provide an effective tool for a variety of machine learning problems. As the transformer layers use residual connections to avoid the problem of vanishing gradients, they can be viewed as the numerical integration of a differential equation. In this extended abstract, we build upon this connection and propose a modification of the internal architecture of a transformer layer. The proposed model places the multi-head attention sublayer and the MLP sublayer parallel to each other. Our experiments show that this simple modification improves the performance of transformer networks in multiple tasks. Moreover, for the image classification task, we show that using neural ODE solvers with a sophisticated integration scheme further improves performance.
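The proposed change can be sketched in a few lines. Below is a minimal pure-Python sketch, with toy scalar stand-ins for the attention and MLP sublayers (the real sublayers operate on tensors; the `0.5` and `0.25` factors are invented for illustration), contrasting the standard sequential residual update with the proposed parallel one:

```python
# Toy stand-ins for the two sublayers; real implementations would be
# multi-head attention and a two-layer MLP acting on tensors.
def attn(x):
    return 0.5 * x   # hypothetical attention sublayer

def mlp(x):
    return 0.25 * x  # hypothetical MLP sublayer

def sequential_block(x):
    # Standard transformer layer: two residual updates applied in sequence,
    # resembling two explicit-Euler steps of an ODE.
    h = x + attn(x)
    return h + mlp(h)

def parallel_block(x):
    # Proposed layer: both sublayers read the same input and their outputs
    # are added to a single residual stream, resembling one integration step
    # of the combined vector field.
    return x + attn(x) + mlp(x)

print(sequential_block(1.0))  # 1.5 + 0.25 * 1.5 = 1.875
print(parallel_block(1.0))    # 1.0 + 0.5 + 0.25 = 1.75
```

The sequential block feeds the attention output into the MLP, while the parallel block evaluates both sublayers at the same point, which is what makes the ODE-solver view natural.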
Emergency vehicles (EMVs) play a crucial role in a city's response to time-critical events such as medical emergencies and fire outbreaks. Existing approaches to reducing EMV travel time employ route optimization and traffic signal preemption without accounting for the coupling between these two subproblems. As a result, the planned route often becomes suboptimal. Moreover, these approaches do not focus on minimizing disruption to the overall traffic flow. To address these issues, we introduce EMVLight, a decentralized reinforcement learning (RL) framework for simultaneous dynamic routing and traffic signal control. EMVLight extends Dijkstra's algorithm to update the optimal route for an EMV in real time as it travels through the traffic network. Consequently, the decentralized RL agents learn network-level cooperative traffic signal phase strategies that reduce both EMV travel time and the average travel time of non-EMVs in the network. We carry out comprehensive experiments on synthetic and real-world maps to demonstrate this benefit. Our results show that EMVLight outperforms benchmark transportation engineering techniques as well as existing RL-based traffic signal control methods.
Emergency vehicles (EMVs) play a crucial role in responding to time-critical events such as medical emergencies and fire outbreaks in urban areas. The less time an EMV spends traveling through traffic, the more likely it is that lives can be saved and property damage reduced. To reduce EMV travel time, prior work has optimized routes based on historical traffic-flow data and preempted traffic signals along the optimal route. However, traffic signal preemption dynamically alters the traffic flow, which in turn changes the optimal route of an EMV. Moreover, traffic signal preemption usually causes significant disturbance to the traffic flow and subsequently increases the travel time of non-EMVs. In this paper, we propose EMVLight, a decentralized reinforcement learning (RL) framework for simultaneous dynamic routing and traffic signal control. EMVLight extends Dijkstra's algorithm to update the optimal route of an EMV in real time as it travels through the traffic network. The decentralized RL agents learn network-level cooperative traffic signal phase strategies that reduce not only EMV travel time but also the average travel time of non-EMVs in the network. This benefit is demonstrated through comprehensive experiments on synthetic and real-world maps. These experiments show that EMVLight outperforms benchmark transportation engineering techniques and existing RL-based signal control methods.
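The routing component above builds on Dijkstra's shortest-path algorithm, rerun as travel-time estimates change while the EMV moves. As a minimal pure-Python sketch (the road network, travel times, and congestion update below are invented for illustration, not EMVLight's actual network or extension):

```python
import heapq

def dijkstra(graph, source, target):
    """Shortest travel time from source to target.
    graph: {node: [(neighbor, travel_time), ...]}"""
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale priority-queue entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float("inf")

# Hypothetical road network; edge weights are current travel-time estimates.
graph = {"A": [("B", 4.0), ("C", 1.0)],
         "B": [("D", 1.0)],
         "C": [("B", 1.0), ("D", 5.0)],
         "D": []}
print(dijkstra(graph, "A", "D"))   # best route A -> C -> B -> D

# As signal phases and queues change, travel-time estimates are refreshed
# and the route is recomputed from the EMV's current position.
graph["C"] = [("B", 6.0), ("D", 5.0)]   # congestion appears on C -> B
print(dijkstra(graph, "A", "D"))        # best route is now A -> B -> D
```

The point of the real-time extension is exactly this recomputation loop: the "optimal" route is only optimal with respect to the latest travel-time estimates.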
Incorporating appropriate inductive biases plays a critical role in learning dynamics from data. A growing body of work has explored ways to enforce energy conservation in the learned dynamics by encoding Lagrangian or Hamiltonian dynamics into the neural network architecture. These existing approaches are based on differential equations, which do not allow discontinuities in the states, thereby limiting the class of systems one can learn. In practice, however, most physical systems, such as legged robots and robotic manipulators, involve contacts and collisions, which introduce discontinuities in the states. In this paper, we introduce a differentiable contact model that can capture contact mechanics: frictionless/frictional, as well as elastic/inelastic. The model can also accommodate inequality constraints, such as limits on joint angles. The proposed contact model extends the scope of Lagrangian and Hamiltonian neural networks by allowing simultaneous learning of contact and system properties. We demonstrate this framework on a series of challenging 2D and 3D physical systems with different coefficients of restitution and friction. The learned dynamics can serve as a differentiable physics simulator for downstream gradient-based optimization tasks, such as planning and control.
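The elastic/inelastic distinction above is governed by the coefficient of restitution e, which reflects and scales the normal velocity component at impact. A tiny illustrative sketch of the classical Newtonian restitution rule (the numbers are invented; the paper's actual contact model is differentiable and learned from data):

```python
def post_impact_normal_velocity(v_n, e):
    """Newtonian restitution: reflect the normal velocity, scaled by e.
    e = 1 -> perfectly elastic; e = 0 -> perfectly inelastic (sticking)."""
    if not 0.0 <= e <= 1.0:
        raise ValueError("coefficient of restitution must lie in [0, 1]")
    return -e * v_n

print(post_impact_normal_velocity(-3.0, 1.0))   # elastic bounce: 3.0
print(post_impact_normal_velocity(-3.0, 0.5))   # partially elastic: 1.5
print(post_impact_normal_velocity(-3.0, 0.0))   # inelastic: 0.0
```

This jump in velocity at the instant of contact is precisely the kind of state discontinuity that plain ODE-based Lagrangian/Hamiltonian networks cannot represent.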
The past few years have witnessed increased interest in incorporating physics-informed inductive biases into deep learning frameworks. In particular, a growing body of literature has been exploring ways to enforce energy conservation while using neural networks to learn dynamics from observed time-series data. In this work, we survey recently proposed energy-conserving neural network models, including HNN, LNN, DeLaN, SymODEN, CHNN, CLNN, and their variants. We provide a compact derivation of the theory behind these models and explain their similarities and differences. Their performance is compared on four physical systems. We point out the possibility of leveraging some of these energy-conserving models to design energy-based controllers.
Recent approaches for modeling the dynamics of physical systems with neural networks enforce Lagrangian or Hamiltonian structure to improve prediction and generalization. However, when coordinates are embedded in high-dimensional data such as images, these approaches either lose interpretability or can only be applied to one particular example. We introduce a new unsupervised neural network model that learns Lagrangian dynamics from images, with an interpretability that benefits prediction and control. The model infers Lagrangian dynamics on generalized coordinates that are simultaneously learned with a coordinate-aware variational autoencoder (VAE). The VAE is designed to account for the geometry of physical systems composed of multiple rigid bodies in the plane. By inferring interpretable Lagrangian dynamics, the model learns physical system properties, such as kinetic and potential energy, which enables long-term prediction of dynamics in image space and the synthesis of energy-based controllers.
Emergency vehicles (EMVs) play a crucial role in responding to time-critical calls such as medical emergencies and fire outbreaks in urban areas. Existing EMV dispatch methods typically optimize routes based on historical traffic-flow data and design traffic signal preemption accordingly; however, we still lack a systematic methodology to address the coupling between EMV routing and traffic signal control. In this paper, we propose EMVLight, a decentralized reinforcement learning (RL) framework for joint dynamic EMV routing and traffic signal preemption. We adopt a multi-agent advantage actor-critic method with policy sharing and a spatial discount factor. The framework addresses the coupling between EMV navigation and traffic signal control via an innovative design of multi-class RL agents and a novel pressure-based reward function. The proposed methodology enables EMVLight to learn network-level cooperative traffic signal phase strategies that not only reduce EMV travel time but also shorten the travel time of non-EMVs. Simulation-based experiments indicate that EMVLight achieves a 42.6% reduction in EMV travel time as well as a 23.5% shorter average travel time compared with existing approaches.
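The "pressure-based reward" refers to the pressure quantity from max-pressure traffic signal control: the imbalance between queued vehicles on an intersection's incoming and outgoing lanes. A minimal pure-Python sketch with invented queue lengths (EMVLight's actual reward additionally accounts for the EMV's route; this shows only the generic pressure term):

```python
def intersection_pressure(incoming_queues, outgoing_queues):
    """Pressure of an intersection: total queued vehicles on incoming lanes
    minus total queued vehicles on outgoing lanes. Signal policies that keep
    pressure low keep traffic flowing through the network."""
    return sum(incoming_queues) - sum(outgoing_queues)

# Hypothetical intersection with three incoming and three outgoing lanes.
p = intersection_pressure([5, 2, 4], [1, 0, 3])
print(p)         # 11 - 4 = 7
print(-abs(p))   # a simple pressure-based reward: penalize imbalance
```

Each agent controlling an intersection can then be rewarded for reducing its local pressure, which is what makes the scheme decentralized.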
A recent study has shown a phenomenon called neural collapse, in which the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
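The "equiangular and maximally separated structure" has a closed form: for K classes, a simplex equiangular tight frame (ETF) consists of K unit vectors whose pairwise cosine similarity is exactly -1/(K-1), obtainable as the columns of sqrt(K/(K-1)) (I - (1/K) 11^T). A minimal pure-Python sketch (the target structure, not the paper's regularizer itself):

```python
import math

def simplex_etf(K):
    """K unit vectors in R^K forming a simplex equiangular tight frame:
    columns of sqrt(K/(K-1)) * (I - (1/K) * ones * ones^T)."""
    c = math.sqrt(K / (K - 1))
    return [[c * ((1.0 if i == k else 0.0) - 1.0 / K) for i in range(K)]
            for k in range(K)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

M = simplex_etf(4)
print(round(dot(M[0], M[0]), 6))   # unit norm: 1.0
print(round(dot(M[0], M[1]), 6))   # pairwise cosine: -1/(K-1) = -1/3
```

A regularizer like the one described above would pull the per-class feature centers toward such a frame, restoring the maximal angular separation that class imbalance otherwise destroys.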
Although deep learning has made remarkable progress in processing various types of data such as images, text and speech, deep models are known to be susceptible to adversarial perturbations: perturbations specifically designed and added to the input to make the target model produce erroneous output. Most of the existing studies on generating adversarial perturbations attempt to perturb the entire input indiscriminately. In this paper, we propose ExploreADV, a general and flexible adversarial attack system that is capable of modeling regional and imperceptible attacks, allowing users to explore various kinds of adversarial examples as needed. We adapt and combine two existing boundary attack methods, DeepFool and the Brendel & Bethge attack, and propose a mask-constrained adversarial attack system, which generates minimal adversarial perturbations under pixel-level constraints, namely ``mask-constraints''. We study different ways of generating such mask-constraints considering the variance and importance of the input features, and show that our adversarial attack system offers users good flexibility to focus on sub-regions of inputs, explore imperceptible perturbations and understand the vulnerability of pixels/regions to adversarial attacks. We demonstrate our system to be effective based on extensive experiments and a user study.
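The mask-constraint idea can be illustrated independently of the specific boundary attacks: the perturbation is zeroed outside a user-supplied pixel mask before being applied, and the result is clipped to the valid pixel range. A hedged sketch with invented toy values (the real system operates on image tensors and computes the perturbation with DeepFool / Brendel & Bethge):

```python
def apply_masked_perturbation(x, delta, mask, lo=0.0, hi=1.0):
    """Add a perturbation only where mask == 1, then clip to [lo, hi]."""
    return [min(hi, max(lo, xi + di * mi))
            for xi, di, mi in zip(x, delta, mask)]

x     = [0.25, 0.9, 0.5, 0.0]   # toy "pixels"
delta = [0.25, 0.3, -0.75, 0.4] # unconstrained adversarial perturbation
mask  = [1, 0, 1, 0]            # user-chosen sub-region to attack

print(apply_masked_perturbation(x, delta, mask))  # [0.5, 0.9, 0.0, 0.0]
```

Only the first and third pixels change (the third also hits the clipping bound); pixels outside the mask are untouched, which is what confines the attack to a sub-region.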
Recently, deep learning has shown its advantages in representation learning and clustering for time-series data. Despite the considerable progress, the existing deep time-series clustering approaches mostly seek to train the deep neural network with some instance-reconstruction-based or cluster-distribution-based objective, which, however, lacks the ability to exploit the sample-wise (or augmentation-wise) contrastive information, or even the higher-level (e.g., cluster-level) contrastiveness, for learning discriminative and clustering-friendly representations. In light of this, this paper presents a deep temporal contrastive clustering (DTCC) approach, which, for the first time to our knowledge, incorporates the contrastive learning paradigm into deep time-series clustering research. Specifically, with two parallel views generated from the original time series and their augmentations, we utilize two identical auto-encoders to learn the corresponding representations, and in the meantime perform cluster distribution learning by incorporating a k-means objective. Further, two levels of contrastive learning are simultaneously enforced to capture the instance-level and cluster-level contrastive information, respectively. With the reconstruction loss of the auto-encoder, the cluster distribution loss, and the two levels of contrastive losses jointly optimized, the network is trained in a self-supervised manner and the clustering result can thereby be obtained. Experiments on a variety of time-series datasets demonstrate the superiority of our DTCC approach over the state-of-the-art.
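The instance-level contrastive term described above is typically an NT-Xent/InfoNCE-style loss over a view and its augmentation. A minimal pure-Python sketch for one anchor, its positive, and a toy set of negatives (the embeddings and temperature are invented, and DTCC's exact formulation may differ):

```python
import math

def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def nt_xent(anchor, positive, negatives, tau=0.5):
    """NT-Xent / InfoNCE: negative log of the softmax weight the anchor
    assigns to its positive among {positive} + negatives, at temperature tau."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / tau for s in sims]
    z = sum(math.exp(l) for l in logits)
    return -math.log(math.exp(logits[0]) / z)

anchor    = [1.0, 0.0]                 # embedding of a time series
positive  = [0.9, 0.1]                 # embedding of its augmentation
negatives = [[-1.0, 0.0], [0.0, 1.0]]  # other instances in the batch
loss = nt_xent(anchor, positive, negatives)
print(loss > 0.0)  # loss is strictly positive while negatives remain
```

The cluster-level term has the same shape but is computed over cluster-assignment columns instead of instance embeddings, which is what lets the two levels capture complementary contrastive information.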